====== MediaWiki to DokuWiki Converter ======

===== Automatic script =====

This script automatically converts a MediaWiki install to DokuWiki. No configuration is required: all it needs is the path to ''LocalSettings.php''.

The shell script presented in the sections below did not work as expected, because DokuWiki did not find the pages after they were injected manually. Instead, this script uses DokuWiki's own API to insert pages from MediaWiki programmatically. The ''mw2dw-conv_sed.sh'' script, which you can find below, has been converted into native PHP, so shell access is not required. It can also be run from the web server if desired.

Find it on GitHub: \\ https://github.com/tetsuo13/MediaWiki-to-DokuWiki-Importer

===== yamdwe tool =====

"yamdwe" (Yet Another Mediawiki to DokuWiki Exporter) is another export tool.

https://github.com/projectgus/yamdwe/

**Pros**:
  * It uses the MediaWiki API, so it can create a local DokuWiki from a remote MediaWiki install. It automatically imports the full revision history for each page, imports media and (optionally) imports users & passwords if it has database access.
  * Uses [[http://mwlib.readthedocs.org/|mwlib]] to parse MediaWiki syntax, so complex pages import more cleanly.

**Cons**:
  * It is Python based, so it's more involved to set up than most of the other tools listed here (install steps for Linux are provided).
  * Slow for large amounts of MediaWiki content
  * Categories are not converted (the content within the category pages)

===== Older scripts: =====

===== Updated Scripts for Converting Mediawiki 1.15.1 to Anteater. =====

Thanks to all of the great scripts on here I was able to do a bit more work towards making something a bit more automated. It's not perfect, but it converted my 200 page site + files over to DokuWiki pretty well. I figured at this point any other little bugs I find I can fix by hand. I didn't have time to perfect it - but hopefully someone else does.
Here's the download: http://www.passportparking.info/Download/mediawiki2dokuwiki2.tar.gz

As I said, it's got some bugs still. You need to run this on your actual mediawiki/dokuwiki server as it copies files between the two file locations.

4/16/2011 - Charlie Youakim (charlie.youakim@passportparking.com)

Bugfix for the upper-to-lower conversion (charset issue); change ''getContent.sh'':

<code bash>
# lowertitle="$(echo $title | tr "[:upper:]" "[:lower:]").txt"
lowertitle="$(echo $title | awk '{print tolower(substr($0,0))}').txt"
</code>

===== Web Based Version =====

Hey, I was playing with AWK and Perl a little bit. I created a MediaWiki to DokuWiki Converter.

An online converter is now hosted at [[http://johbuc6.coconia.net/mediawiki2dokuwiki.php]]. FIXME

==== Requirements ====
  * bash
  * Perl

==== Capabilities ====
It is able to transform
  * Links
  * Bold/italic text
  * Lists
  * Talkings
  * Code

==== Limitations ====
  * One page at a time

==== Missing features (yet) ====
It is **not** able to transform
  * tables
  * CODE (that is, consecutive lines of text starting with a space), which should be surrounded by %%<code></code>%% in DokuWiki

==== Bugs ====
  * %%[text hello]%% is converted into a link %%[[text|hello]]%% but it should just stay like that
  * %%'''''IMPORTANT!!!'''''%% is converted into %%**//IMPORTANT!!!**//%% but should be %%//**IMPORTANT!!!**//%% or %%**//IMPORTANT!!!//**%%
  * %%//server/share%% is not converted, but since %%//%% opens italic font, this line should be translated to %%<nowiki>//</nowiki>server/share%%

==== Source ====
File ''mediawiki2dokuwiki.sh'':

<code bash>
#! /bin/sh
# Mediawiki2Dokuwiki Converter
# originally by Johannes Buchner
# License: GPL (http://www.gnu.org/licenses/gpl.txt)

# Headings
cat mediawiki | \
  perl -pe 's/^[ ]*=([^=])/<h1>${1}/g' | \
  perl -pe 's/([^=])=[ ]*$/${1} <\/h1>/g' | \
  perl -pe 's/^[ ]*==([^=])/<h2>${1}/g' | \
  perl -pe 's/([^=])==[ ]*$/${1} <\/h2>/g' | \
  perl -pe 's/^[ ]*===([^=])/<h3>${1}/g' | \
  perl -pe 's/([^=])===[ ]*$/${1} <\/h3>/g' | \
  perl -pe 's/^[ ]*====([^=])/<h4>${1}/g' | \
  perl -pe 's/([^=])====[ ]*$/${1} <\/h4>/g' | \
  perl -pe 's/^[ ]*=====([^=])/<h5>${1}/g' | \
  perl -pe 's/([^=])=====[ ]*$/${1} <\/h5>/g' | \
  perl -pe 's/^[ ]*======([^=])/<h6>${1}/g' | \
  perl -pe 's/([^=])======[ ]*$/${1} <\/h6>/g' \
  > mediawiki1

cat mediawiki1 | \
  perl -pe 's/<\/?h1>/======/g' | \
  perl -pe 's/<\/?h2>/=====/g' | \
  perl -pe 's/<\/?h3>/====/g' | \
  perl -pe 's/<\/?h4>/===/g' | \
  perl -pe 's/<\/?h5>/==/g' | \
  perl -pe 's/<\/?h6>/=/g' | \
  cat > mediawiki2

# lists (DokuWiki nests list items by two spaces per level)
cat mediawiki2 |
  perl -pe 's/^[\*#]{4}\*/          * /g' | \
  perl -pe 's/^[\*#]{3}\*/        * /g' | \
  perl -pe 's/^[\*#]{2}\*/      * /g' | \
  perl -pe 's/^[\*#]{1}\*/    * /g' | \
  perl -pe 's/^\*/  * /g' | \
  perl -pe 's/^[\*#]{4}#/          \- /g' | \
  perl -pe 's/^[\*\#]{3}\#/        \- /g' | \
  perl -pe 's/^[\*\#]{2}\#/      \- /g' | \
  perl -pe 's/^[\*\#]{1}\#/    \- /g' | \
  perl -pe 's/^\#/  - /g' | \
  cat > mediawiki3

#[link] => [[link]]
cat mediawiki3 |
  perl -pe 's/([^\[])\[([^\[])/${1}[[${2}/g' |
  perl -pe 's/^\[([^\[])/[[${1}/g' |
  perl -pe 's/([^\]])\]([^\]])/${1}]]${2}/g' |
  perl -pe 's/([^\]])\]$/${1}]]/g' \
  > mediawiki4

#[[url text]] => [[url|text]]
cat mediawiki4 |
  perl -pe 's/(\[\[[^| \]]*) ([^|\]]*\]\])/${1}|${2}/g' \
  > mediawiki5

# bold, italic
cat mediawiki5 |
  perl -pe "s/'''/**/g" |
  perl -pe "s/''/\/\//g" \
  > mediawiki6

# talks
cat mediawiki6 |
  perl -pe "s/^[ ]*:/>/g" |
  perl -pe "s/>:/>>/g" |
  perl -pe "s/>>:/>>>/g" |
  perl -pe "s/>>>:/>>>>/g" |
  perl -pe "s/>>>>:/>>>>>/g" |
  perl -pe "s/>>>>>:/>>>>>>/g" |
  perl -pe "s/>>>>>>:/>>>>>>>/g" \
  > mediawiki7

# preformatted blocks
cat mediawiki7 |
  perl -pe "s/<pre>/<code>/g" |
  perl -pe "s/<\/pre>/<\/code>/g" \
  > mediawiki8

cat mediawiki8 > dokuwiki
</code>
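
The heading steps in the script above rely on the two syntaxes counting equals signs in opposite directions: a MediaWiki level-1 heading (one sign) becomes a six-sign DokuWiki heading, and so on. As a rough sketch of the same mapping in a single pass, assuming each heading sits alone on its line (the function name ''mw_heading_to_dw'' is made up for illustration and is not part of the script):

```shell
#!/bin/sh
# Hypothetical helper: invert MediaWiki heading levels into DokuWiki ones.
# Reads wiki text on stdin and writes the converted text to stdout.
mw_heading_to_dw() {
  awk '
    # A heading line starts with 1 to 6 equals signs and ends with equals signs.
    match($0, /^=+/) {
      n = RLENGTH
      if (n <= 6 && $0 ~ /=+[ \t]*$/) {
        title = $0
        # Strip the leading and trailing markers plus surrounding blanks.
        gsub(/^=+[ \t]*|[ \t]*=+[ \t]*$/, "", title)
        # Level n in MediaWiki corresponds to 7 - n signs in DokuWiki.
        marks = substr("======", 1, 7 - n)
        print marks " " title " " marks
        next
      }
    }
    # Everything else passes through unchanged.
    { print }
  '
}
```

It could be used as ''mw_heading_to_dw < mediawiki > mediawiki2''. Unlike the chain of substitutions, this does not go through temporary HTML-style tags, but it also does not try to handle headings with unbalanced markers.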


==== Howto use (for shell newbies) ====
  - Make sure you are on Linux/Unix ;-)
  - Save the code above in a file named ''mediawiki2dokuwiki.sh''.
  - Save the MediaWiki page you want to transform to a file called ''mediawiki'' in the same directory.
  - In the shell go to the directory (using ''cd'') and execute: <code bash>
chmod +x mediawiki2dokuwiki.sh #we want to be able to execute it
./mediawiki2dokuwiki.sh
</code>
  - You now have some files named ''mediawiki'' plus a number. These are intermediate debugging steps and can be ignored.
  - In the file ''dokuwiki'' you'll find your DokuWiki syntax.

Remember, all the fame goes to me, cause I started this ;-)! --- //[[buchner.johannes@gmx.at|Johannes Buchner]] 2006-01-26 19:27//
==== Changelog ====
  * added Talkings --- //[[buchner.johannes@gmx.at|Johannes Buchner]] 2006-01-27 11:37//

==== ToDo ====
Would be great if someone wants to improve this! What is needed:
  * See Limitations & Missing features
  * A web-based service: now available: [[http://johbuc6.coconia.net/mediawiki2dokuwiki.php]]. (This is no longer active! <- wrong, it still is!)

==== Feedback & discussion ====
  * Yeah, i could remove leading whitespaces at all, but this might ruin the design. //[[buchner.johannes@gmx.at|Johannes Buchner]] 2006-01-27 11:41//
  * I just began to implement tables with: 
  perl -pe "s/^[ ]*\{\|[^\|]*$//g" |
  perl -pe "s/^[ ]*\|\}[ ]*$//g" |
  perl -pe "s/^[ ]*\|([^\|]+)\|[ ]*$/|${1}|/g" 
  # ...
 But [[http://meta.wikimedia.org/wiki/Help:Table|tables in Wikimedia]] are crap. I suggest to use a html2wiki implementation for this need ('cause the needs will be very special) like [[http://diberri.dyndns.org/html2wiki.html]] //[[buchner.johannes@gmx.at|Johannes Buchner]] 2006-01-27 12:00//
  * Feedback/Questions:
    * Feedback by Juergen Mueller: Is there an easy way to convert a bunch of MediaWiki articles to DokuWiki articles at once? Our MediaWiki wiki has some hundreds of articles and therefore it is not feasible to do it manually file by file.
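
One possible approach to the batch question above, sketched under the assumption that each article has already been saved as a separate ''*.mw'' file in one directory. The file extension, the function name ''convert_all'' and the ''CONVERTER'' variable are illustrative and not part of the original script. Since ''mediawiki2dokuwiki.sh'' reads a fixed file named ''mediawiki'' and writes ''dokuwiki'', a loop can feed it one article at a time:

```shell
#!/bin/sh
# Path to the single-page converter. It is assumed to read ./mediawiki
# and write ./dokuwiki in the current directory, as the script above does.
CONVERTER="${CONVERTER:-./mediawiki2dokuwiki.sh}"

# convert_all SRC_DIR DEST_DIR
# Converts every SRC_DIR/*.mw file and stores the result as a lowercase
# .txt page file in DEST_DIR.
convert_all() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for page in "$src"/*.mw; do
    # Skip the literal pattern when no file matches.
    [ -e "$page" ] || continue
    name=$(basename "$page" .mw | tr '[:upper:]' '[:lower:]')
    cp "$page" mediawiki || return 1
    "$CONVERTER" || return 1
    mv dokuwiki "$dest/$name.txt" || return 1
  done
}
```

Calling ''convert_all exported pages'' would then leave one ''.txt'' file per article in ''pages'', ready to be copied to ''data/pages''. The intermediate ''mediawiki1'', ''mediawiki2'', ... files are overwritten on every iteration.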
=== Code markup ===
Graham Macleod 30/1/08 1:50pm GMT - Hi there. Code mark up which displays such as
  
in DokuWiki uses double spaces but your converter seems to keep the single space that MediaWiki uses. I'd also just like to take the time to give you massive props on this. It's been a life saver.
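
For the indentation difference described above, a minimal sketch of an extra filter step (the function name ''indent_code_lines'' is hypothetical; nothing like it exists in the converter as posted): lines that start with exactly one space get a second one, so DokuWiki treats them as preformatted.

```shell
#!/bin/sh
# Hypothetical filter: turn MediaWiki single-space code lines into
# DokuWiki two-space preformatted lines. Reads stdin, writes stdout.
indent_code_lines() {
  # Only lines with exactly one leading space followed by a non-space
  # character are changed; already indented lines are left alone.
  sed 's/^ \([^ ]\)/  \1/'
}
```

Note that a line such as a single space followed by ''*'' would turn into a DokuWiki list item, so a step like this would need to run before the list conversion, not after it.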

=== sed Version ===
  * I changed this script to use sed, and have made a few improvements (see bugs & missing features). Maybe this can be of use for someone.
#! /bin/sh
# Mediawiki2Dokuwiki Converter
# originally by Johannes Buchner 
# changes by Frederik Tilkin:		- uses sed instead of perl
#				- resolved some bugs ('''''IMPORTANT!!!''''' becomes //**IMPORTANT!!!**//, // becomes <nowiki>//</nowiki> if it is not in a CODE block)
# 				- added functionality (multiple lines starting with a space become CODE blocks)
#
# Licence: GPL (http://www.gnu.org/licenses/gpl.txt)

# First escape things that are already DokuWiki but not MediaWiki syntax
# //	=>	<nowiki>//</nowiki> 	(only when it is NOT in a PREFORMATTED line, and when it is NOT in a LINK [] !)
# **	=>  <nowiki>**</nowiki> so that it's correctly converted to DokuWiki <code> blocks later on

cat mediawiki \
	| sed -r -n '
		#starts with a SPACE, so it is part of a code block, just print and do nothing
		/^[ ]/ { p; d }
		#else: replace ALL **... strings (not at beginning of line)
		s/([^^][^\*]*)(\*\*+)/\1<nowiki>\2<\/nowiki>/g
		# 		also replace ALL //... strings 
		s/([^\/]*)(\/\/+)/\1<nowiki>\2<\/nowiki>/g
		#		change the ones that have been replaced in a link [] BACK to normal (do it twice in case [http://addres.com http://address.com] ) [quick and dirty]
		s/([\[][^\[]*)(<nowiki>)(\/\/+)(<\/nowiki>)([^\]]*)/\1\3\5/g ; s/([\[][^\[]*)(<nowiki>)(\/\/+)(<\/nowiki>)([^\]]*)/\1\3\5/g
		
		p
	  ' \
	| sed -r -n '
		# See also: http://www.grymoire.com/Unix/Sed.html#uh-40
		# 	http://en.wikipedia.org/wiki/Regular_expression
		# This is pretty advanced sed syntax, so I ll try to explain as much as possible
		################################################################################
		
		# if line starts with a space, add it to the hold buffer
		# we do this by 'branching' to :addtopre
		/^ [ ]*[^ ][^ ]*/ b addtopre
		# if line has only whitespace or is empty, the preformatted block is over, so we surround that with <pre>
		# we do this by 'branching' to :outputpre
		/^[ ]*$/ b outputpre
		# if line starts with NO whitespace, the preformatted block is over, so we surround that with <pre>
		/^[^ ].*$/ b outputpre
				
		#else this is a normal line
				#s/(.*)/NORMAL LINE: \1/g; p
			# print the line
			p
			#delete the current pattern space (so new cycle is started -> jumps to top)
			d
		
		# this is a line that should be part of a CODE block
		:addtopre
			#add it to the hold buffer
			H
				#s/(.*)/ADDED LINE: \1/g; p
			# if this is the last line of the file (end-of-file), empty this line and then output this last preformatted block
			$ { s/.*//g
				b outputpre
			}
			#delete the current pattern space (so new cycle is started -> jumps to top)
			d
		# this is where a paragraph is surrounded by <pre>

		:outputpre
				#s/(.*)/END OF CODE LINE: \1/g; p
			# HOLD buffer is exchanged with the pattern space
			x

			# IF not empty, surround with <pre> and PRINT the pattern space
			/(.+)/ {
				# surround it with <pre>
				s/(.+)/<pre>\1<\/pre>/g
				p
			}
			# exchange pattern space and hold buffer again, pattern is now the current line (not part of the preformatted block) and PRINT this line
			x
			p
			#delete the current pattern space			
			s/.*//g
			#and exchange this again with the hold buffer, so that the hold buffer is empty again			
			x
			#delete the current pattern space (so new cycle is started -> jumps to top)
			d
	' \
    > mediawiki0

# Headings
cat mediawiki0 \
   | sed -r 's/^[ ]*=([^=])/<h1>\1/g' \
   | sed -r 's/([^=])=[ ]*$/\1 <\/h1>/g' \
   | sed -r 's/^[ ]*==([^=])/<h2>\1/g' \
   | sed -r 's/([^=])==[ ]*$/\1 <\/h2>/g' \
   | sed -r 's/^[ ]*===([^=])/<h3>\1/g' \
   | sed -r 's/([^=])===[ ]*$/\1 <\/h3>/g' \
   | sed -r 's/^[ ]*====([^=])/<h4>\1/g' \
   | sed -r 's/([^=])====[ ]*$/\1 <\/h4>/g' \
   | sed -r 's/^[ ]*=====([^=])/<h5>\1/g' \
   | sed -r 's/([^=])=====[ ]*$/\1 <\/h5>/g' \
   | sed -r 's/^[ ]*======([^=])/<h6>\1/g' \
   | sed -r 's/([^=])======[ ]*$/\1 <\/h6>/g' \
  > mediawiki1

cat mediawiki1 \
   | sed -r 's/<\/?h1>/======/g' \
   | sed -r 's/<\/?h2>/=====/g' \
   | sed -r 's/<\/?h3>/====/g' \
   | sed -r 's/<\/?h4>/===/g' \
   | sed -r 's/<\/?h5>/==/g' \
   | sed -r 's/<\/?h6>/=/g' \
  > mediawiki2

# lists (DokuWiki nests list items by two spaces per level)
cat mediawiki2 \
   | sed -r 's/^[*#][*#][*#][*#]\*/          * /g' \
   | sed -r 's/^[*#][*#][*#]\*/        * /g' \
   | sed -r 's/^[*#][*#]\*/      * /g' \
   | sed -r 's/^[*#]\*/    * /g' \
   | sed -r 's/^\*/  * /g' \
   | sed -r 's/^[*#][*#][*#][*#]#/          - /g' \
   | sed -r 's/^[*#][*#][*#]#/        - /g' \
   | sed -r 's/^[*#][*#]#/      - /g' \
   | sed -r 's/^[*#]#/    - /g' \
   | sed -r 's/^#/  - /g' \
  > mediawiki3

#[url text] => [url|text]
cat mediawiki3 \
   | sed -r 's/([^[]|^)(\[[^] ]*) ([^]]*\])([^]]|$)/\1\2|\3\4/g' \
  > mediawiki4

#[link] => [[link]]
cat mediawiki4 \
   | sed -r 's/([^[]|^)(\[[^]]*\])([^]]|$)/\1[\2]\3/g' \
  > mediawiki5

# bold, italic
cat mediawiki5 \
   | sed -r "s/'''''(.*)'''''/\/\/**\1**\/\//g" \
   | sed -r "s/'''/**/g" \
   | sed -r "s/''/\/\//g" \
  > mediawiki6

# talks
cat mediawiki6 \
   | sed -r "s/^[ ]*:/>/g" \
   | sed -r "s/>:/>>/g" \
   | sed -r "s/>>:/>>>/g" \
   | sed -r "s/>>>:/>>>>/g" \
   | sed -r "s/>>>>:/>>>>>/g" \
   | sed -r "s/>>>>>:/>>>>>>/g" \
   | sed -r "s/>>>>>>:/>>>>>>>/g" \
  > mediawiki7

# inline code
cat mediawiki7 \
   | sed -r "s/<code>/\'\'/g" \
   | sed -r "s/<\/code>/\'\'/g" \
  > mediawiki8

# preformatted blocks
cat mediawiki8 \
   | sed -r "s/<pre>/<code>/g" \
   | sed -r "s/<\/pre>/<\/code>/g" \
  > mediawiki9

#100720-MSe: remove "<\code>\n \n"
cat mediawiki9 \
   | sed 'N;N;s/<\/code>\n[ \t]*\n//;P;D;D;' \
  > mediawiki10

#cat mediawiki10 > dokuwiki

# font (color, ...)
cat mediawiki10 \
  | sed 's///g' \
  | sed 's/<\/span>/<\/font>/g' \
  | sed 's///g' \
 > mediawiki11

cat mediawiki11 > dokuwiki
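
The hold-buffer logic in the second ''sed'' invocation of this script can be hard to follow. As a rough illustration of the same idea (not a drop-in replacement: the output layout differs slightly, and the name ''wrap_pre_blocks'' is made up here), the following awk filter groups consecutive lines that start with a space and wraps each group in pre tags:

```shell
#!/bin/sh
# Hypothetical filter: wrap runs of space-indented lines in <pre>...</pre>.
# Reads stdin, writes stdout.
wrap_pre_blocks() {
  awk '
    # A line that starts with a space and contains a non-space character
    # belongs to a preformatted block.
    /^ +[^ ]/ {
      if (!inblock) { print "<pre>"; inblock = 1 }
      print
      next
    }
    # Any other line (empty, blank, or unindented) ends the open block.
    {
      if (inblock) { print "</pre>"; inblock = 0 }
      print
    }
    # Close a block that runs up to the end of the input.
    END { if (inblock) print "</pre>" }
  '
}
```

The state variable ''inblock'' plays the role of the sed hold buffer: it remembers whether a block is open, so the closing tag can be emitted as soon as the first non-indented line (or the end of input) is seen.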



There is also an issue when bold and italic text are combined. I tested with the German UNIX Wikipedia article, and there were 2 tags that turned whole parts of the generated DokuWiki page bold.
The following change fixes this behaviour:


$ diff mediawiki2dokuwiki.sh mediawiki2dokuwiki.sh.080925-1
7d6
< # changes by Reiner Rottmann: - fixed erroneous interpretation of combined bold and italic text.
165,169c164
<
< cat mediawiki9 \
<    | sed -r "s/\*\*\/\//\/\/\*\*/g"> mediawiki10
<
< cat mediawiki10 > dokuwiki
---
> cat mediawiki9 > dokuwiki


===== Automatic script =====
This script gets the contents of your MediaWiki (via a database connection) and converts them to DokuWiki syntax.
It works out of the box; you only need to configure the database name and password (''WIKIDB'' and ''WIKIPASS'').

Example:

cd mediawiki2dokuwiki
./getContent.sh
mv old /var/www/dokuwiki/data/pages
chmod a+r /var/www/dokuwiki/data/pages/old -R



Here you can find the package: http://dabax.net/files/mediawiki2dokuwiki.tar.gz (this link is broken as of 9/3/2012)

This is the main script.

#!/bin/bash

#About your mediawiki
WIKIDB="DATABASE_NAME"
WIKIPASS="DATABASE_PASSWORD"

#The destination folder
DEST="old"

#Dont touch this
TITLES="titles"
PHARSER="./m2d.sh"

mysql --password=$WIKIPASS $WIKIDB -e 'select cur_title from cur;' | \
while read title; do

	newtitle="$(echo $title | tr "[:upper:]" "[:lower:]").txt" 
	echo "$newtitle"
	mysql --password=$WIKIPASS $WIKIDB -e "select cur_text from cur where cur_title='$title';" \
	| sed s/'\\n'/\\n/g | grep -v cur_text |  $PHARSER $DEST/$newtitle 

done

for f in $DEST/*; do 
	[ $(cat $f | wc -w) -lt 25 ] && \
	{ echo "Deleting $f, too short"; rm -f $f;}
done



echo ""
echo "Done. Put the contents of $DEST to Path_Of_dokuwiki/data/pages/"


m2d.sh is the "sed version" published in this page, with some modifications.
Enjoy it!
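
The title-to-filename step in the main script lowercases the page title and appends ''.txt''. A small sketch of that step as a reusable function, using ''awk'''s ''tolower'' as in the bugfix noted earlier on this page (the function name ''page_filename'' is an assumption for illustration, not something the package provides):

```shell
#!/bin/sh
# Hypothetical helper: derive a page file name from a MediaWiki title.
page_filename() {
  # Lowercase the title and append the .txt extension used for page files.
  printf '%s\n' "$1" | awk '{ print tolower($0) ".txt" }'
}
```

Inside the loop it could replace the ''tr'' call, for example ''newtitle=$(page_filename "$title")''. Passing the title as a quoted argument also keeps titles with spaces or glob characters intact.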

==== ERROR 1146 (42S02) at line 1: Table 'dbname.cur' doesn't exist ====
Hi,

Using the above attached code, I'm having some problems:


ERROR 1146 (42S02) at line 1: Table 'pswiki.cur' doesn't exist
cat: old/*: No such file or directory
Deleting old/*, too short

Done. Put the contents of old to Path_Of_dokuwiki/data/pages/

(where pswiki is the database containing my MW data, which came from a Windows-based MySQL server via mysqldump.) The directory ./old is empty.
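
The ''cat: old/*'' and ''Deleting old/*'' lines in that output come from the cleanup loop running even though the glob matched nothing. A guarded variant of the loop might look like this (a sketch only: the function name ''prune_short_pages'' and its arguments are made up here, and the 25-word threshold is taken from the script above):

```shell
#!/bin/sh
# prune_short_pages DIR [MIN_WORDS]
# Deletes regular files in DIR that contain fewer than MIN_WORDS words
# (default 25) and does nothing when DIR is empty or missing.
prune_short_pages() {
  dir=$1
  min=${2:-25}
  for f in "$dir"/*; do
    # When the glob matches nothing, $f is the literal pattern; skip it.
    [ -f "$f" ] || continue
    # Arithmetic expansion strips any padding some wc versions add.
    words=$(( $(wc -w < "$f") ))
    if [ "$words" -lt "$min" ]; then
      echo "Deleting $f, too short"
      rm -f "$f"
    fi
  done
}
```

This only silences the follow-up noise; the underlying ''cur'' table error still has to be resolved before any pages are exported.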

If I turn on MySQL statement logging:

SET GLOBAL general_log_file='/var/log/mysql/sql.log';
SET GLOBAL general_log='ON';

then I get the following only in that log file when I run ./getContent.sh:

110216  2:11:17    75 Connect   root@localhost on pswiki
                   75 Query     select @@version_comment limit 1
                   75 Query     select cur_title from cur
                   75 Quit


It looks to me as if what's intended to be a cursor is being interpreted as a literal table name, but I'm getting out of my depth there. I have tried back-ticking `cur` in getContent.sh to no avail.

Can anyone shed light?

Thanks!

 --- [[user>tomgreen|tomgreen]] //2011/02/16 03:06//

EDIT: I've used the web-based service linked above, and apart from some table funnies it worked very well - a lifesaver. Thanks!

 --- [[user>tomgreen|tomgreen]] //2011/02/18 03:26//

I have solved the "cur problem". In my opinion, ''cur'' is meant to be a VIEW.
Create this view, named "cur", with the following SQL statement:


CREATE VIEW cur AS SELECT mw_page.page_title AS cur_title, mw_text.old_text AS cur_text
FROM mw_page,mw_text WHERE mw_page.page_id=mw_text.old_id;


and re-run the shell script.

Hope this helps!

 --- [[user>gtournat|gtournat]] //2011/11/06 13:16//

